Modeling online word segmentation performance in structured artificial languages
Authors
Abstract
Lexical dependencies abound in natural language: words tend to follow particular words or word categories. However, artificial language learning experiments exploring word segmentation have so far lacked such structure. In the present study, we explore whether simple inter-word dependencies influence the word segmentation performance of adult learners. We use a continuous testing paradigm instead of an experiment-final test battery to reveal the trajectory of learning and to allow detailed comparison with three computational models of word segmentation. Adult performance on languages with dependencies is equal to or lower than performance on languages without them. All of the models tested perform worse on languages with dependencies, though a novel particle filter-based lexical segmentation model produces learning curves most similar to those of human subjects.
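The abstract does not spell out how the languages or the three models are built. As a rough illustration only, the sketch below assumes a hypothetical lexicon of six trisyllabic words in two categories, treats strict alternation of category-A and category-B words as the "simple inter-word dependency", and measures forward transitional probability (TP), a widely used statistical segmentation cue. The TP calculation is not claimed to be any of the three models evaluated in the study; the word list and the names `make_stream` and `mean_tps` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical lexicon (an illustrative assumption, not the study's stimuli):
# six trisyllabic CV words split into two categories, A and B.
CATEGORY_A = ["pabiku", "tibudo", "golatu"]
CATEGORY_B = ["daropi", "menafe", "sokeli"]


def make_stream(n_words, dependent, seed=0):
    """Return a list of words. If `dependent`, A-words and B-words strictly
    alternate (an assumed inter-word dependency); otherwise any word may
    follow any other."""
    rng = random.Random(seed)
    words = []
    for i in range(n_words):
        if dependent:
            pool = CATEGORY_A if i % 2 == 0 else CATEGORY_B
        else:
            pool = CATEGORY_A + CATEGORY_B
        words.append(rng.choice(pool))
    return words


def syllabify(word):
    """Split a word made of two-letter CV syllables into its syllables."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]


def transitional_probabilities(syllables):
    """Forward TP(b | a) = count(a, b) / count(a as the first element of a pair)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}


def mean_tps(words):
    """Return (mean TP inside words, mean TP across word boundaries)."""
    syllables = [s for w in words for s in syllabify(w)]
    tps = transitional_probabilities(syllables)
    within, across = [], []
    pos = 0
    for w in words:
        n = len(syllabify(w))
        # Transitions between syllables of the same word.
        for k in range(pos, pos + n - 1):
            within.append(tps[(syllables[k], syllables[k + 1])])
        # Transition from this word's last syllable to the next word's first.
        end = pos + n - 1
        if end + 1 < len(syllables):
            across.append(tps[(syllables[end], syllables[end + 1])])
        pos += n
    return sum(within) / len(within), sum(across) / len(across)


for dependent in (False, True):
    w_tp, b_tp = mean_tps(make_stream(600, dependent))
    print(f"dependent={dependent}: within-word TP={w_tp:.2f}, boundary TP={b_tp:.2f}")
```

Under these assumptions the within-word TP is exactly 1.0 in both languages, while the boundary TP rises from roughly 1/6 without the dependency to roughly 1/3 with it. The dip that a TP-based learner uses to place word boundaries is therefore shallower, which is one plausible (not demonstrated here) reason a purely statistical learner could find structured languages harder.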
Similar resources
A Handwritten Uygur Character String Recognition Method Combining Segmentation and Whole Word Recognition
Although Uygur is one of the official languages of the Xinjiang Uygur Autonomous Region, research on its handwriting recognition technology still lags behind, and text input still relies on keyboard codes. Based on the previously developed UCpen2.0 handwriting sample database, this paper proposes an algorithm that combines whole-word recognition and segmentation recognition of Uygur; an...
Text classification in Asian languages without word segmentation
We present a simple approach for Asian language text classification without word segmentation, based on statistical n-gram language modeling. In particular, we examine Chinese and Japanese text classification. With character n-gram models, our approach avoids word segmentation. However, unlike traditional ad hoc n-gram models, the statistical language modeling based approach has strong information...
Statistical Speech Segmentation and Word Learning in Parallel: Scaffolding from Child-Directed Speech
In order to acquire their native languages, children must learn richly structured systems with regularities at multiple levels. While structure at different levels could be learned serially, e.g., speech segmentation coming before word-object mapping, redundancies across levels make parallel learning more efficient. For instance, a series of syllables is likely to be a word not only because of ...
Vietnamese Word Segmentation
Word segmentation is the first and obligatory task for every NLP system. For inflectional languages like English, French, and Dutch, word boundaries are simply assumed to be whitespace or punctuation. In various Asian languages, including Chinese and Vietnamese, however, whitespace is never used to determine word boundaries, so one must resort to higher levels of information such as: informa...
Modeling human performance in statistical word segmentation.
The ability to discover groupings in continuous stimuli on the basis of distributional information is present across species and across perceptual modalities. We investigate the nature of the computations underlying this ability using statistical word segmentation experiments in which we vary the length of sentences, the amount of exposure, and the number of words in the languages being learned...
Publication date: 2012